Identification of causal structures and quantification of direct information flows in complex systems
is a challenging yet important task, with practical applications in many fields. Data generated
by dynamical processes or large-scale systems are often symbolized, either because of the finite
resolution of the measurement apparatus or because of the needs of statistical estimation. By algorithmic
application of causation entropy, we investigate the effects of symbolization on important concepts
such as the Markov order and causal structure of the tent map. We find that these quantities
depend nonmonotonically and, above all, sensitively on the choice of symbolization. Indeed, we
show that Markov order and causal structure do not necessarily converge to their original analog
counterparts as the resolution of the partitioning becomes finer.

PACS numbers: Causal Structure (04.20.Gz), Entropy (65.40.gd), Information Theory (87.19.lo), Markov
Processes (02.50.Ga)


While quantitative description and understanding of natural phenomena is at the core of science, 
inference of cause-and-effect relationships from measured data is a central problem in the study of 
complex systems, with many important practical applications. For example, knowing “what causes 
what” allows for the effective identification of the cause of a medical disease or disorder, and for the 
detection of the root source of defects of engineering systems. However, the act of measuring the 
states of a dynamical system mediates the inference of cause-and-effect relationships. For instance, all 
observation procedures carry the limitation of finite precision. A common example is the binning of 
data (histograms) into discrete symbols. In this paper we use a toy mathematical model to show that 
such digitization (symbolization) may lead to inferred causal relationships that differ significantly from 
those of the original system, even when the amount of data is unlimited. Although based on a simple 
mathematical model, our results shed new light on the challenging nature of causality inference. 
I. INTRODUCTION

Uncovering cause-and-effect relationships remains an exciting challenge in many fields of applied science. For
instance, identifying the causes of a disease in order to prescribe effective treatments is of primary importance in
medical diagnosis; locating the defects that could cause abrupt changes of the connectivity structure and adversely
affect the performance of the system is a main objective in structural health monitoring. Consequently, the
problem of inferring causal relationships from observational data has attracted much attention in recent years.

Identifying causal relationships in large-scale complex systems turns out to be a highly nontrivial task. As a matter 
of fact, a reliable test of causal relationships requires the effective determination of whether the cause-and-effect is real 
or is due to the secondary influence of other variables in the system. This, in principle, can be achieved by testing the 
relative independence between the potential cause and effect conditioned on all other variables in the system. Such 
a method essentially demands the estimation of joint probabilities for (very) high dimensional variables from limited 
available data and suffers from the curse of dimensionality. In practice, there are various approaches in statistics and
information theory that aim at accomplishing the proper conditioning without the need of testing upon all remaining
variables of the system at once. The basic idea behind many such approaches originates from the classical
PC-algorithm [19], which repeatedly measures the relative independence between the cause and effect conditioned on
combinations of the other variables. As an alternative, we recently developed a new entropy-based computational
approach that infers the causal structure via a two-stage process, by first aggregatively discovering potential causal
relationships and then progressively removing those found in the first stage that are redundant [15-17].

In almost all computational approaches for inferring causal structure, it is necessary to estimate the joint
probabilities underlying the given process. Large-scale data sets are commonly analyzed via discretization procedures, for
instance using binning, ranking, and/or permutation methods [20-25]. These methods generally require fine-tuning
of parameters and can be sensitive to noise. On the other hand, the time evolution of a physical system can only
be measured and recorded to a finite precision, resembling an approximation of the true underlying process. This
finite resolution can be characterized by means of a finite set of symbols, yielding a discretization of the phase space.
Regardless of the nature and motivation of discretization, its precise impact on the causal structure of the system
is essentially unexplored. Here, we investigate the symbolic description of a dynamical system and how it affects the
resulting Markov order and causal structures. Such a description, based on partitioning the phase space of the system,
is also commonly known as symbolization. Symbolization converts the original dynamics into a stochastic process
supported on a finite sample space. Focusing on the tent map for the simplicity, clarity, and completeness of computation
it allows, we introduce numerical procedures to compute the joint probabilities of the stochastic process resulting
from arbitrary partitioning of the phase space. Furthermore, we develop causation entropy, an information-theoretic
measure based on conditional mutual information, as a means to determine the Markov order and (temporal) causal
structure of such processes. We find that a partitioning that maintains dynamic invariants of the system does
not necessarily preserve its causal structure. On the other hand, both the Markov order and causal structure depend
nonmonotonically and, indeed, sensitively on the partitioning.

II. PHASE SPACE PARTITIONING AND SYMBOLIC DYNAMICS 

A powerful method of analyzing nonlinear dynamical systems is to study their symbolic dynamics through some
topological partition of the phase space. The main idea characterizing symbolic dynamics is to represent the
state of the system using symbols from a finite alphabet defined by the partition, rather than using a continuous
variable of the original phase space. The issue of partitioning was shown to
affect entropic computations in a nontrivial manner and, as we will highlight in the paper, is also intricate
and central to a general information-theoretic description of the system.

A. Partition of the phase space and symbolic dynamics 

Consider a discrete dynamical system given by

$$x_{t+1} = f(x_t), \tag{1}$$

where $x_t \in M$ represents the state of the system at time $t$ and the vector field $f : M \to M$ governs the dynamic
evolution of the states. A (topological) partition of the phase space $M$ is a finite collection
$\mathcal{A} \stackrel{\mathrm{def}}{=} \{A_0, \dots, A_m\}$ of disjoint open sets whose closures cover $M$, i.e.,

$$M = \bigcup_{i=0}^{m} \overline{A_i}. \tag{2}$$

The partition leads to the corresponding symbolic dynamics. In particular, for any trajectory $\{x_0, x_1, x_2, \dots\}$ of the
original dynamics contained in the union of the $A_i$'s, the partition yields a symbol sequence $\{s_0, s_1, s_2, \dots\}$ given by

$$s_t = \sum_{i=0}^{m} \chi_{A_i}(x_t) \cdot i, \tag{3}$$

where $\chi_A$ is the indicator function defined as

$$\chi_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases} \tag{4}$$

In other words, the symbolic state $s_t$ is determined by the open set $A_i$ that contains the state $x_t$. See Fig. 1 for a
schematic illustration.

FIG. 1: Schematic illustration of partitioning the phase space and the resulting symbolic dynamics. Given the partitioning
$\mathcal{A} = \{A_0, A_1, A_2\}$, the trajectory $(x_0, x_1, x_2, x_3, \dots)$ leads to a symbol sequence $(s_0, s_1, s_2, s_3, \dots) = (2, 0, 0, 1, \dots)$.
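The symbolization rule of Eq. (3) can be sketched in a few lines of Python for a one-dimensional phase space. This is a minimal illustration, not the authors' implementation: the function name `symbolize`, the cut points, and the sample trajectory are assumptions chosen so that the output reproduces the symbol sequence quoted in Fig. 1. A state lying exactly on a cut point belongs to no open cell; the sketch assigns it to the cell on its right by convention.

```python
from bisect import bisect_right

def symbolize(trajectory, cut_points):
    """Return, for each state, the index of the partition cell containing it.

    cut_points are the interior boundaries of the cells A_0, ..., A_m
    (hypothetical helper; boundary points go to the right-hand cell).
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, x) for x in trajectory]

# Three cells on [0, 1]: A_0 = [0, 0.3), A_1 = (0.3, 0.7), A_2 = (0.7, 1]
trajectory = [0.9, 0.2, 0.1, 0.5]
print(symbolize(trajectory, [0.3, 0.7]))  # [2, 0, 0, 1]
```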

In general, the same symbol sequence may result from distinct trajectories. If the partition is generating, then every
symbol sequence corresponds to a unique trajectory. A special case is the so-called Markov partition, for
which the transition from one symbolic state to another is independent of past states, analogous to a Markov process.
On the other hand, a generating partition is not necessarily Markov.

The precise effects of partitioning on the symbolic dynamics remain an interesting and challenging problem,
with recent progress in a few directions. Focusing on the equivalence between the original and symbolic dynamics,
Bollt et al. studied the consequence of misplaced partitions on dynamical invariants [33, 34], while Teramoto and
Komatsuzaki investigated topological change in the symbolic dynamics upon different choices of Markov partitions.
On the other hand, the degree of self-sufficiency of the symbolic dynamics, irrespective of the equivalence to the original
dynamics, has started to attract increasing interest, focusing on information-theoretical measures such as information
closure and prediction efficiency [39]. We here adopt a different perspective and study how causal structures emerge
and/or change under different choices of partitioning.
B. From dynamical systems to stochastic processes via symbolic dynamics


The symbolic description of a dynamical system leads naturally to an interpretation of such systems as stochastic
processes [40]. Let $(M, \Sigma, \mu)$ be a measure space with Borel field $\Sigma$ and probability measure $\mu$ such that $\mu : \Sigma \to [0,1]$
and $\mu(M) = 1$. Furthermore, assume that $\mu$ is the unique ergodic invariant measure under the mapping $f$, that is,

$$\begin{aligned}
&\text{(Invariance) For every } B \in \Sigma,\ \mu(f^{-1}(B)) = \mu(B). \\
&\text{(Ergodicity) For every } B \in \Sigma \text{ with } f^{-1}(B) = B, \text{ either } \mu(B) = 0 \text{ or } \mu(B) = 1.
\end{aligned} \tag{5}$$

Given the partitioning defined by Eq. (2), the symbol space (alphabet) $\Omega$ is made of $m+1$ symbols (alphabet letters),

$$\Omega = \{0, 1, \dots, m-1, m\}. \tag{6}$$

We can formally define a random variable $S_t$ taking values in $\Omega$, with the probabilities given by

$$P(s_t) \stackrel{\mathrm{def}}{=} \mathrm{Prob}(S_t = s_t) = \mu(A_{s_t}), \quad \forall s_t \in \Omega. \tag{7}$$

This line of reasoning can be generalized to accommodate joint probabilities of arbitrary finite length,

$$\begin{aligned}
P(s_t, s_{t+1}, s_{t+2}, \dots) &= \mathrm{Prob}(S_t = s_t, S_{t+1} = s_{t+1}, S_{t+2} = s_{t+2}, \dots) \\
&= \mu\left(A_{s_t} \cap f^{-1}(A_{s_{t+1}}) \cap f^{-2}(A_{s_{t+2}}) \cap \dots\right).
\end{aligned} \tag{8}$$

The probabilities in Eqs. (7) and (8) are time-invariant because $\mu$ is invariant, as assumed in Eq. (5). Within this
setting, $P(s)$ denotes the probability that the symbolic state of the system (at any time) equals $s$, while $P(s, s')$ is the
probability of the current and next symbols being $s$ and $s'$, respectively. Therefore, this framework defines a discrete
stochastic process where the symbolic states are regarded as random variables whose stationary joint distributions
are determined by Eqs. (7) and (8). We point out that the support of such a stochastic process associated with the
symbolic dynamics is commonly referred to as a shift space.
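When only a finite symbol sequence is available rather than the invariant measure, the stationary joint probabilities $P(s_t, s_{t+1}, \dots)$ are often approximated by relative frequencies of sliding words, which is justified by ergodicity in the long-sequence limit. The sketch below illustrates this idea; the function name `word_probabilities` is a hypothetical helper, and this frequency estimator is an assumption of the example rather than the exact measure-based computation described in the text.

```python
from collections import Counter

def word_probabilities(symbols, length):
    """Estimate P(s_t, ..., s_{t+length-1}) by sliding-window frequencies.

    Hypothetical helper: returns a dict mapping each observed word (tuple)
    to its relative frequency in the symbol sequence.
    """
    n_windows = len(symbols) - length + 1
    if n_windows <= 0:
        raise ValueError("sequence shorter than requested word length")
    counts = Counter(tuple(symbols[i:i + length]) for i in range(n_windows))
    return {word: c / n_windows for word, c in counts.items()}

# Toy sequence: only the words (0, 1) and (1, 0) occur, equally often.
print(word_probabilities([0, 1, 0, 1, 0], 2))
```

Such estimates carry sampling error, which is one reason the text later bases its decisions on aggregated information-theoretic quantities instead of on individual probabilities.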


III. MARKOV ORDER, CAUSAL STRUCTURE, AND INFERENCE

For a given symbolization of a dynamical system that originates from a chosen partitioning of the phase space,
we are interested in defining and identifying a minimal set of past states that encode information about the current
state $S_t$. This will enable us to remove redundant information of the past when making efficient predictions about
the future.
A. Markov Order and Causal Structure of the Symbolic Dynamics


In view of the probabilistic interpretation of the symbolic dynamics, we refer to a partition as Markovian of order $k$
if the resulting stochastic process is Markov of order $k$; that is, if the symbolic state depends only on its past $k$ states
rather than on the entire history. Using the notations

$$S_{t^-} \stackrel{\mathrm{def}}{=} (S_{t-1}, S_{t-2}, \dots), \qquad S_{t^- \setminus (t-k)^-} \stackrel{\mathrm{def}}{=} (S_{t-1}, S_{t-2}, \dots, S_{t-k}), \tag{9}$$

a process is Markov of order $k$ if and only if the conditional probabilities satisfy

$$P(s_t \mid s_{t^-}) = P(s_t \mid s_{t^- \setminus (t-k)^-}) \tag{10}$$

for every choice of $s_{t^-}$, and no nonnegative integer smaller than $k$ fulfills this requirement. (When $k = 0$, we call
the process an i.i.d. process.) In other words, information carried in the past states is all conditionally redundant
given information about the past $k$ states. On the other hand, there might be further redundancy in the information
encoded in these $k$ states. In particular, let

$$\mathcal{P}_t \subset \{t-k, t-k+1, \dots, t-1\} \tag{11}$$

be a minimal set contained in the Markov time indices for which

$$P(s_t \mid s_{t^-}) = P(s_t \mid s_{\mathcal{P}_t}) \tag{12}$$

holds for every $s_{t^-}$; by minimality, Eq. (12) does not hold for any proper subset $\mathcal{P}'_t$ of $\mathcal{P}_t$. We refer to $\mathcal{P}_t$ as the set of
causal time parents of time $t$. Conditioning on the states with time indices given by $\mathcal{P}_t$, information of all other states
becomes redundant. The states at time(s) $\mathcal{P}_t$ are the only ones that cause the current state, and therefore the set $\mathcal{P}_t$
defines a causal structure of the symbolic dynamics. This can be viewed as a finer description than the Markov order,
which in turn allows for a more efficient encoding of the process. Figure 2 illustrates the difference between Markov
order and causal structure of an example process.


B. Entropy, Mutual Information, Conditional Mutual Information, and Causation Entropy 

Practical evaluation of joint probabilities is delicate for two reasons: first, numerical imperfections due to finite
precision of the computing machines are unavoidable; second, when the probabilities need to be estimated from finite
data samples, estimation errors are inevitable. Naturally, such numerical and estimation errors
will propagate into Eqs. (10) and (12), making it difficult to distinguish equalities from inequalities. These equations
need to be examined for joint sequences, leading to an overwhelming number of (heuristic) decisions that need to be
made. This, in turn, renders unreliable the direct determination of Markov order and causal structure based on their
respective formal definitions. From a statistical standpoint, it is preferable to base such determination on a minimal
number of equations/decisions. Appropriately defined information-theoretic measures fulfill this goal by collectively
grouping the joint probabilities, therefore greatly reducing the number of equations/decisions.

FIG. 2: (Color online) An example causal structure of a Markov process, shown over the states $S_{t-6}, S_{t-5}, \dots, S_{t-1}, S_t$. Here the process is of order $k = 4$, although only
three (marked in red) out of the four past time indices (enclosed by dashed box) are needed to render the current state (green)
conditionally independent of the rest of the past. The set of causal time parents of $t$ is therefore $\mathcal{P}_t = \{t-4, t-3, t-1\}$ in
Eq. (12).

Recall that Shannon entropy is a quantitative measure of the uncertainty of a random variable. For a discrete
random variable $X$ with probability mass function $P(x) \stackrel{\mathrm{def}}{=} \mathrm{Prob}(X = x)$, its entropy is defined as [41]

$$H(X) = -\sum_{x} P(x) \log P(x), \tag{13}$$

where log is taken to be base 2 throughout the paper. The mutual information between two random variables $X$ and
$Y$ is given by [42]

$$I(X;Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}. \tag{14}$$

Mutual information measures the deviation from independence between $X$ and $Y$. It is nonnegative and
equals zero if and only if $X$ and $Y$ are independent. Similarly, the conditional mutual information between $X$ and $Y$
given $Z$ is defined as

$$I(X;Y \mid Z) = \sum_{x,y,z} P(x,y,z) \log \frac{P(x,y \mid z)}{P(x \mid z) P(y \mid z)}, \tag{15}$$

and it measures the reduction of uncertainty of $X$ ($Y$) due to $Y$ ($X$) given $Z$. Conditional mutual information is
nonnegative, and equals zero if and only if $X$ and $Y$ are conditionally independent given $Z$.
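To make Eqs. (13)-(15) concrete, the following sketch evaluates them from an explicit joint probability table, using the standard identity $I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$. The helper names (`marginal`, `entropy`, `cond_mutual_info`) and the dictionary representation of the joint distribution are assumptions made for this illustration. The example uses two independent fair bits and their XOR, for which the unconditional mutual information vanishes while the conditional one equals one bit.

```python
from math import log2

def marginal(joint, axes):
    """Marginal pmf of the coordinates listed in `axes` (tuple of indices)."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(pmf):
    """Shannon entropy in bits, Eq. (13)."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def cond_mutual_info(joint, xs, ys, zs=()):
    """I(X;Y|Z) via H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); zs=() gives I(X;Y)."""
    h = lambda axes: entropy(marginal(joint, axes))
    return h(xs + zs) + h(ys + zs) - h(xs + ys + zs) - h(zs)

# Joint pmf of (X, Y, Z) with X, Y independent fair bits and Z = X xor Y.
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(cond_mutual_info(joint, (0,), (1,)))        # I(X;Y): zero up to rounding
print(cond_mutual_info(joint, (0,), (1,), (2,)))  # I(X;Y|Z): one bit
```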

For a stationary stochastic process $\{S_t\}$ and a given set of time indices $J_t \subset t^-$, we propose to define the (temporal)
causation entropy (CSE) from $J_t$ to $t$ to be

$$C_{J_t \to t} = I(S_{J_t}; S_t \mid S_{t^- \setminus J_t}). \tag{16}$$

Being a conditional mutual information, causation entropy is always nonnegative. It is strictly positive if and only if
uncertainty about the state $S_t$ is reduced due to the knowledge about $S_{J_t}$. This occurs when the past states with time
indices $J_t$ carry information about the current state at time $t$. We remark that Eq. (16) is an adapted definition of
causation beyond our previous work [15-17] for a specific scenario, in the sense that direct causality is now intimately
linked to causation entropy being strictly positive without the need of appropriately choosing the conditioning set.


C. Inference of Markov Order and Causal Structure 


Based on the definition of Markov order in Eq. (10), a stochastic process has Markov order $k$ if and only if

$$C_{(t-k)^- \to t} = 0 \tag{17}$$

for the smallest possible nonnegative integer $k$. Algorithmically, we start by examining Eq. (17) for $k = 0$. If it holds
true, then the process is i.i.d. If not, we proceed with $k = 1, 2, \dots$, until the equation is satisfied. The resulting value
of $k$ is the Markov order of the process. As a side remark, we note that there are other entropy-based approaches to
determine the Markov order.
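The search over $k$ described above can be sketched as follows, assuming the stationary probabilities of all words of length $T+1$ are known (history truncated to $T$ past symbols). With this truncation, the causation entropy of Eq. (17) reduces to $H(S_t \mid S_{t-1}, \dots, S_{t-k}) - H(S_t \mid S_{t-1}, \dots, S_{t-T})$. The function names, the tolerance `tol`, and the two-state Markov chain used as test input are all assumptions of this illustration and are not taken from the paper.

```python
from itertools import product
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def cond_entropy(words, k):
    """H(S_t | S_{t-1}, ..., S_{t-k}); S_t is the last symbol of each word."""
    joint, past = {}, {}
    for w, p in words.items():
        tail = w[len(w) - (k + 1):]          # last k+1 symbols of the word
        joint[tail] = joint.get(tail, 0.0) + p
        past[tail[:-1]] = past.get(tail[:-1], 0.0) + p
    return entropy(joint) - entropy(past)

def markov_order(words, tol=1e-9):
    """Smallest k whose truncated causation entropy is below tol."""
    T = len(next(iter(words))) - 1           # truncated history length
    h_full = cond_entropy(words, T)
    for k in range(T + 1):
        if cond_entropy(words, k) - h_full < tol:
            return k
    return T

# Assumed test process: stationary two-state first-order Markov chain.
P = [[0.9, 0.1], [0.4, 0.6]]
pi = [0.8, 0.2]                              # stationary distribution of P
words = {}
for w in product((0, 1), repeat=5):
    p = pi[w[0]]
    for s, s_next in zip(w, w[1:]):
        p *= P[s][s_next]
    words[w] = p
print(markov_order(words))  # 1
```

The tolerance plays the role of the threshold discussed later in this section: floating-point rounding leaves the computed causation entropy slightly different from exact zero.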

Now we discuss the inference of causal structure via causation entropy. Given the definition of causal structure in
Eq. (12), it follows that a Markov process of order $k$ has causal time parents $\mathcal{P}_t$ if and only if

$$C_{(t^- \setminus \mathcal{P}_t) \to t} = 0 \tag{18}$$

where $\mathcal{P}_t \subset \{t-k, t-k+1, \dots, t-1\}$ and no proper subset of $\mathcal{P}_t$ fulfills the equation. Computationally, it is
generally infeasible to efficiently find the causal structure without additional assumptions about the underlying joint
distributions. A general assumption, called the faithfulness or stability assumption, requires that the joint effect/cause
is decomposable into individual components. That is to say, for every $t' \in \mathcal{P}_t$, the contribution measured
in terms of the conditional mutual information $I(S_{t'}; S_t \mid S_{Q_t})$ is non-vanishing for every $Q_t$ that does not include $t'$
or $t$. Under this assumption, we can show that the causal time parents $\mathcal{P}_t$ form the minimal set of time indices that
maximizes causation entropy, i.e.,

$$\mathcal{P}_t = \bigcap_{J_t \in \mathcal{K}} J_t, \quad \text{where } \mathcal{K} = \{J \subset t^- : \forall K \subset t^-,\ C_{J \to t} \geq C_{K \to t}\}. \tag{19}$$

We refer to Eq. (19) as the optimal causation entropy principle, which allows the transformation of the causal
inference problem into a numerical optimization problem.

Algorithmically, we propose to infer the causal set $\mathcal{P}_t$ via a two-stage iterative process, described as follows. The first
stage, which we call aggregative discovery, starts by finding a time index $t_1 \in t^-$ which maximizes the mutual information
$I(S_{t'}; S_t)$, provided that such mutual information is strictly positive. That is,

$$t_1 = \arg\max_{t' \in t^-} I(S_{t'}; S_t). \tag{20}$$

Then, at each subsequent step, a new time index $t_{i+1}$ is identified among the rest of the indices to maximize the
conditional mutual information given the previously selected time indices, that is,

$$t_{i+1} = \arg\max_{t' \in t^- \setminus \{t_1, \dots, t_i\}} I(S_{t'}; S_t \mid S_{t_1, t_2, \dots, t_i}). \tag{21}$$

Such iterative process ends when the corresponding maximum conditional mutual information equals zero, and the
outcome yields a set of time indices $Q_t = \{t_1, t_2, \dots\} \supset \mathcal{P}_t$.

Then, in the second stage, we progressively remove time indices in $Q_t$ that are redundant (i.e., do not belong to
$\mathcal{P}_t$). In particular, we enumerate through the time indices in $Q_t$ and remove each component $t_i$ for which

$$I(S_{t_i}; S_t \mid S_{Q_t \setminus \{t_i\}}) = 0. \tag{22}$$


Every time a component is removed, the set $Q_t$ is updated accordingly. The set that remains at the end of the process
is taken as the inferred set of causal time parents $\mathcal{P}_t$. We remark that the discovery and removal stages of our algorithm are reminiscent
of the forward selection and backward elimination in regression analysis. Here, for the purpose of correct and
consistent inference of Markov order and causal structure, we have adopted conditional mutual information in our
algorithm.
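The two-stage procedure of Eqs. (20)-(22) can be sketched as below, again assuming that exact stationary word probabilities for a truncated history of length $T$ are available. Time indices are represented as lags (lag 1 means $t-1$). The function names, the tolerance `tol`, and the test process, in which $S_t$ copies $S_{t-2}$ with probability 0.9, are assumptions of this illustration, not the authors' code.

```python
from itertools import product
from math import log2

def entropy_of(words, idx):
    """Entropy (bits) of the marginal on the word positions in idx."""
    pmf = {}
    for w, p in words.items():
        key = tuple(w[i] for i in idx)
        pmf[key] = pmf.get(key, 0.0) + p
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def cmi(words, x, y, z):
    """I(S_x; S_y | S_z) for lists of word positions x, y, z."""
    h = lambda idx: entropy_of(words, sorted(idx))
    return h(x + z) + h(y + z) - h(x + y + z) - h(z)

def causal_parents(words, tol=1e-9):
    """Return the lags of the inferred causal time parents of S_t."""
    T = len(next(iter(words))) - 1       # truncated history length
    now = [T]                            # position of S_t inside each word
    pos = lambda lag: T - lag            # position of S_{t-lag}
    chosen = []
    # Stage 1: aggregative discovery, Eqs. (20)-(21)
    while len(chosen) < T:
        z = [pos(l) for l in chosen]
        rest = [l for l in range(1, T + 1) if l not in chosen]
        best = max(rest, key=lambda l: cmi(words, [pos(l)], now, z))
        if cmi(words, [pos(best)], now, z) <= tol:
            break
        chosen.append(best)
    # Stage 2: progressive removal, Eq. (22)
    for lag in list(chosen):
        z = [pos(l) for l in chosen if l != lag]
        if cmi(words, [pos(lag)], now, z) <= tol:
            chosen.remove(lag)
    return sorted(chosen)

def example_words(T=4, stay=0.9):
    """Assumed test process: S_t equals S_{t-2} with probability `stay`."""
    words = {}
    for w in product((0, 1), repeat=T + 1):
        p = 0.25                         # first two symbols uniform
        for i in range(2, T + 1):
            p *= stay if w[i] == w[i - 2] else 1 - stay
        words[w] = p
    return words

print(causal_parents(example_words()))  # [2]
```

In this toy process the Markov order is 2 while the only causal time parent is $t-2$, which illustrates how the causal structure can be a strictly finer description than the Markov order.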

Two practical considerations need to be taken into account for the inference of Markov order and causal structure.
First, the history of a variable needs to be truncated, i.e., $t^-$ will be approximated by $t^- \approx \{t-T, t-T+1, \dots, t-1\}$
for some sufficiently large $T$ in Eq. (17) (regarding Markov order) and Eqs. (20) and (21) (regarding causal structure). In particular,
such truncation leads to a partial fulfillment of both the Markov requirement in Eq. (10) and the causal structure in
Eq. (12). Second, numerical and estimation errors generally render information-theoretic quantities such as the mutual
information, conditional mutual information, and causation entropy nonzero (and, in particular, even negative [48, 49]).
In order to decide whether or not an estimate should be regarded as zero, one needs a threshold-selecting procedure:
an estimated quantity smaller than a predefined threshold will be considered vanishing.


IV. MARKOV ORDER AND CAUSAL STRUCTURE FROM THE SYMBOLIZATION OF TENT MAP 

In this section we apply our theoretical procedure to determine the Markov order and causal
structure of symbolic dynamics of the tent map. We have chosen the one-dimensional tent map
as an example for two reasons. First, the tent map is simple enough to allow explicit analytical computations of the entropic
functionals of known probability distributions. Such computations are not only useful for cross-checking numerical
estimates, they also provide some insights into the information-theoretic measures employed in our investigation.
Second, despite its simple form, the tent map appears to serve as a rich test-bed for the investigation of how
Markov order and causal structure of a dynamical system are affected by the choice of symbolization. In fact, under
symbolization, even a 1D map such as the tent map can be regarded as quite complex from a topological standpoint [27-32].
Finally, we remark that our computational framework can be applied to arbitrary unimodal maps.


A. Tent Map and Partitioning 


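The tent map is a one-dimensional system given by $x_{t+1} = T(x_t)$, where $T : [0,1] \to [0,1]$ is defined as

$$T(x) \stackrel{\mathrm{def}}{=} \begin{cases} 2x, & \text{if } 0 \leq x \leq \frac{1}{2}, \\ 2(1-x), & \text{if } \frac{1}{2} < x \leq 1. \end{cases} \tag{23}$$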

Specifically, we shall discuss the manner in which different choices of the partitioning lead to (qualitatively and
quantitatively) different symbolizations of the original dynamics with specific Markov orders and causal structures.
For the time being, we limit our investigation to a binary symbolic description of the dynamical map. Consider a
general binary partitioning of the phase space defined by the parameter $a \in (0,1)$, so that

$$\mathcal{A} = \{A_0, A_1\} = \{[0, a), (a, 1]\}. \tag{24}$$

Such partitioning allows us to represent a continuous trajectory by a sequence of binary symbols (bits). We remark
that the choice of $a = 0.5$ leads to a generating partition which gives rise to a symbolic dynamics that is topologically
equivalent to the original system.
FIG. 3: (Color online) Graphical depiction of joint probabilities by preimages of the tent map. The intervals $I_l^{(0)}$ (green) and $I_l^{(1)}$ (red) are defined
by Eq. (27) and are shown for levels $l = 0, 1, 2, 3$ for the choice of $a = 0.45$. The interval endpoints are: level 0: $a$; level 1: $a/2$, $1-a/2$;
level 2: $a/4$, $1/2-a/4$, $1/2+a/4$, $1-a/4$; level 3: $a/8$, $1/4-a/8$, $1/4+a/8$, $1/2-a/8$, $1/2+a/8$, $3/4-a/8$, $3/4+a/8$, $1-a/8$.
In general, at each level $l$, the subintervals start from $I_l^{(0)}$ and then alternate between $I_l^{(1)}$ and $I_l^{(0)}$. The relative ordering of the subintervals across levels can change for
different values of $a$, although they remain the same as shown in the picture for all $a \in (4/9, 4/7)$.


B. Invariant Probability Measure and Joint Probabilities 


The unique ergodic invariant measure of the tent map can be found by solving the first equation in (5) (also called
a continuity equation) for each subinterval $[a, b]$ of $[0,1]$, leading to

$$\mu([a, b]) = b - a. \tag{25}$$

This immediately gives $P(0) = a$ and $P(1) = 1 - a$. From Eq. (8), the joint probability of an arbitrary sequence of
length $n + 1$ is determined by

$$P(s_0, s_1, s_2, \dots, s_n) = \mu\left(\bigcap_{i=0}^{n} I_i^{(s_i)}\right), \tag{26}$$

where the intervals are defined by the preimages of $[0, a)$ and $(a, 1]$ as

$$I_l^{(0)} \stackrel{\mathrm{def}}{=} \{x \in [0,1] : T^l(x) \in [0, a)\}, \qquad I_l^{(1)} \stackrel{\mathrm{def}}{=} \{x \in [0,1] : T^l(x) \in (a, 1]\}. \tag{27}$$

In other words, the initial conditions corresponding to a specific symbolic string of length $n$ are formed by a finite
disjoint union of intervals. Figure 3 shows an example of these intervals $\{I_l^{(0)}, I_l^{(1)}\}$ for $a = 0.45$ and four levels
$l = 0, 1, 2, 3$.


This offers a computationally feasible description with which joint probabilities can be calculated. From Eq. (26),
we obtain that for $n = 0$, $P(s_0) = \mu(I_0^{(s_0)})$, giving $P(0) = a$ and $P(1) = 1 - a$ as expected. For $n = 1$, we have
$P(s_0, s_1) = \mu(I_0^{(s_0)} \cap I_1^{(s_1)})$. This gives the probabilities $P(0,0) = a/2$, $P(0,1) = a/2$, $P(1,0) = a/2$, and
$P(1,1) = 1 - 3a/2$ for all $a < 2/3$ (see also Fig. 3); note that $P(0,0) + P(1,0) = a = P(0)$, as stationarity requires. For general values of $n$, we proceed as follows. First, we define the
level-$l$ preimages of $a$ to be $\{a_i^{(l)}\}$ ($i = 1, 2, \dots, 2^l$), which are the roots of the equation

$$T^l(x) - a = 0. \tag{28}$$

For convenience, we sort $\{a_i^{(l)}\}$ in the ascending order of $i$ and, additionally, define $a_0^{(l)} \stackrel{\mathrm{def}}{=} 0$ and $a_{2^l+1}^{(l)} \stackrel{\mathrm{def}}{=} 1$. Then,
the preimage sets of $[0, a)$ and $(a, 1]$ as introduced in Eq. (27) can be explicitly computed as (for every $l \geq 1$)

$$I_l^{(0)} = \bigcup_{i=0}^{2^{l-1}} \left(a_{2i}^{(l)}, a_{2i+1}^{(l)}\right), \qquad I_l^{(1)} = \bigcup_{i=1}^{2^{l-1}} \left(a_{2i-1}^{(l)}, a_{2i}^{(l)}\right). \tag{29}$$


Such preimage sets are subsequently used to calculate joint probabilities. Note that for symbolic strings of length
$n$, both the total number of joint probabilities and the total number of intervals contributing to these probabilities
equal $2^n$.
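As an illustration of how Eqs. (26)-(29) translate into a computation, the sketch below evaluates the probability of a binary word exactly by pulling intervals back through the two branches of the tent map and summing interval lengths (the invariant measure being the length, Eq. (25)). It uses rational arithmetic to avoid rounding. The function names and the backward-recursion layout are assumptions of this example; it is one possible way to organize the calculation, not necessarily the authors' procedure.

```python
from fractions import Fraction
from itertools import product

def preimage(intervals):
    """Preimage under the tent map of a disjoint union of intervals."""
    out = []
    for c, d in intervals:
        out.append((c / 2, d / 2))          # left branch, T(x) = 2x
        out.append((1 - d / 2, 1 - c / 2))  # right branch, T(x) = 2(1 - x)
    return out

def intersect(intervals, cell):
    """Clip each interval to the cell (lo, hi); drop empty pieces."""
    lo, hi = cell
    clipped = [(max(c, lo), min(d, hi)) for c, d in intervals]
    return [(c, d) for c, d in clipped if c < d]

def word_probability(word, a):
    """P(s_0, ..., s_n) for the binary partition {[0, a), (a, 1]}."""
    cells = {0: (Fraction(0), a), 1: (a, Fraction(1))}
    current = [cells[word[-1]]]             # states allowed at the last step
    for s in reversed(word[:-1]):           # walk back towards time 0
        current = intersect(preimage(current), cells[s])
    return sum(d - c for c, d in current)

a = Fraction(9, 20)                         # partition point a = 0.45
print(word_probability((0, 0), a))          # 9/40, i.e. a/2
total = sum(word_probability(w, a) for w in product((0, 1), repeat=4))
print(total)                                # 1
```

Because every quantity is a `Fraction`, the word probabilities sum to exactly one, which gives a simple consistency check on the interval bookkeeping.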

Note that the joint probability of the symbol sequence $P(s_0, s_1, s_2, \dots, s_n)$ depends on the particular choice of the
partitioning point $a$. However, the functional $a$-dependences of such probabilities remain the same for all $a$ values
within intervals determined by the $2^n$ distinct roots $\{x_i^*\}$ of the equation $T^n(x) - x = 0$, given by

$$x_i^* = \begin{cases} i/(2^n - 1), & i = 0, 2, \dots, 2^n - 2; \\ (i+1)/(2^n + 1), & i = 1, 3, \dots, 2^n - 1. \end{cases} \tag{30}$$

We emphasize that although the analytical expressions derived above are specialized to the tent map, the proposed
procedure is, in general, suitable for the computation of joint probabilities of arbitrary unimodal maps.
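The closed form in Eq. (30) is easy to check numerically. The sketch below lists the $2^n$ candidate roots and verifies, in exact rational arithmetic, that each one is mapped to itself by $n$ applications of the tent map. The function names are assumptions of this example.

```python
from fractions import Fraction

def tent(x):
    """Tent map T(x) on [0, 1], Eq. (23)."""
    return 2 * x if x <= Fraction(1, 2) else 2 * (1 - x)

def fixed_points(n):
    """Roots of T^n(x) - x = 0 according to Eq. (30), in ascending order of i."""
    points = []
    for i in range(2 ** n):
        if i % 2 == 0:
            points.append(Fraction(i, 2 ** n - 1))
        else:
            points.append(Fraction(i + 1, 2 ** n + 1))
    return points

def iterate(x, n):
    for _ in range(n):
        x = tent(x)
    return x

n = 2
roots = fixed_points(n)
print([str(x) for x in roots])                  # ['0', '2/5', '2/3', '4/5']
print(all(iterate(x, n) == x for x in roots))   # True
```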

C. Markov Order 


We numerically investigate the Markov order of the stochastic processes arising from the symbolic dynamics of
the tent map. Recall from Eqs. (10) and (17) that the Markov order can be determined as the smallest nonnegative
integer $k$ for which the causation entropy $C_{(t-k)^- \to t}$ vanishes. The Markov order reveals the length of the history
that carries unique information about the present symbolic state of the system.

Figure 4 shows the causation entropy $C_{(t-k)^- \to t}$ as a function of $k$ for a few choices of the partitioning point $a$
with values equal to 0.444, 0.47, 0.5, and 0.516, respectively. For each $a$, the causation entropy decreases in $k$. Such
monotonic dependence on $k$ is in fact of general validity since, for every $k < k'$, the difference of causation entropies
$C_{(t-k)^- \to t} - C_{(t-k')^- \to t}$ can be expressed in terms of a conditional mutual information,
$I(S_t; S_{t-k-1}, \dots, S_{t-k'} \mid S_{t-1}, \dots, S_{t-k})$, which is nonnegative. On
the other hand, the mutual information $I(S_t; S_{t-k}, S_{t-k+1}, \dots, S_{t-1})$ generally increases in $k$ and saturates when the
causation entropy reaches zero. Results shown in Fig. 4 suggest that Markov orders can be different upon different
choices of the partition point, yielding $k = 3$ for $a = 0.444$, $k = 4$ for $a = 0.47$, $k = 0$ for $a = 0.5$, and $k = 5$ for
$a = 0.516$, respectively. Such difference is remarkable given the relatively small differences in the values of $a$.

FIG. 4: (Color online) Numerical determination of Markov order from causation entropy. The curves show numerically computed
causation entropy $C_{(t-k)^- \to t}$ (a) and mutual information $I(S_t; S_{t-1}, \dots, S_{t-k+1}, S_{t-k})$ (b) as functions of $k$ for various choices
of $a$. The results imply that the Markov order of the symbolic dynamics of the tent map equals 3 ($a = 0.444$), 4 ($a = 0.47$),
0 ($a = 0.5$), and 5 ($a = 0.516$), respectively. In the numerical calculations, we approximate $t^-$ by its finite truncation
$(t-15, t-14, \dots, t-1)$.

How does the Markov order depend on the partition point $a$ in general? We address this question by computing
the causation entropy $C_{(t-k)^- \to t}$ in Eq. (17) as a function of $a$ for a range of $k$ values, $k = 0, 1, 2, \dots$. The results
are shown in Fig. 5. Visually, the symbolic dynamics achieves Markov order $k$ at the values of $a$ for which all curves
beyond the $(k-1)$-th one reach zero. For example, Fig. 5 confirms the same Markov orders for the $a$ values as shown
in Fig. 4. Interestingly, the Markov order seems to depend sensitively on the choice of partitioning: a tiny
change in $a$ generally results in a (large) change in the Markov order. This behavior is evident from the non-smooth
and fractal appearance of the curves in Fig. 5 and from the seemingly erratic manner in which they overlap and
collapse.

Having explored the influence of the location of the partition point $a$, we ask: how do partition refinements affect
the Markov order? We now extend our investigation to non-binary symbolic descriptions of the tent map. Consider
a map refinement of a given partitioning $\mathcal{A} = \{A_0, A_1, \dots, A_m\}$, which is given by the intersection of the original
partition elements and their preimages under $f$, as

$$\mathcal{R}(\mathcal{A}) \stackrel{\mathrm{def}}{=} \{f^{-1}(A_i) \cap A_j\}_{i,j=0}^{m}. \tag{31}$$

FIG. 5: (Color online) Causation entropies for the symbolic states of the tent map. Causation entropies $C_{(t-k)^- \to t}$ (for
$k = 1, 2, \dots, 7$) are computed and shown for a range of $a$ values: $a \in (0, 1)$ (a) and $a \in (0.43, 0.52)$ (b). Vertical dashed lines in
panel (b) mark four specific choices of $a$: 0.444, 0.47, 0.5, and 0.516, respectively. A grayscale bar is shown below each plotting
panel to visualize the numerically determined Markov order as a function of $a$, where a darker color corresponds to a higher
Markov order (white corresponds to order 0). For each $a$, the Markov order is numerically determined as the smallest integer
$k$ such that $C_{(t-k)^- \to t} < 10^{-3} H(a)$.


Inspecting Eq. (31) and the definition of Markov order given by Eq. (10), we conclude that if the Markov order resulting
from the original partition $\mathcal{A}$ is $k$, then the Markov order upon the map-refinement partition $\mathcal{R}(\mathcal{A})$ equals $k - 1$ if
$k > 1$, and is less than or equal to 1 if $k \leq 1$ (see proof in the Appendix). This result is numerically confirmed in Fig. 6(a)
for the tent map. In particular, for the original partition point $a = 0.5$, the Markov order equals 0 and map refinement
increases it by 1, while further map refinement does not change the order. On the other hand, for $a = 0.444$, which
yields Markov order 3, each map refinement decreases its order by 1 until the order reaches 1. Interestingly, the same
does not hold true for arbitrary refinements of the partition. Fig. 6(b) shows that a general refinement can either
increase, decrease, or maintain the Markov order of the resulting process. There seems to be no predictable pattern
by which the Markov order changes upon arbitrary refinement. This behavior is further explored in Fig. 6(c), which
shows that for a specific initial partition (here $a = 0.444$), different locations of the new partition point generally
result in different Markov orders. Once again, such behavior appears in an irregular pattern.
FIG. 6: (Color online) Markov order upon map refinements (a) and arbitrary refinements (b)-(c). In panels (a)-(b) the
partition points are shown, whereas in panel (c) the initial partition point is fixed at 0.444 while the new partition
point varies from 0 to 1. In all calculations, we truncated $t^-$ as $(t-15, t-14, \dots, t-1)$. A grayscale bar at the
bottom of (c) shows the numerically computed Markov order as a function of $\alpha$, where a darker color corresponds to
a higher Markov order (white corresponds to order 0). For each $\alpha$, the corresponding Markov order is computed as
the smallest integer $k$ for which the causation entropy falls below $10^{-3} H(\alpha)$.
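
The map refinement of Eq. (31) can be illustrated with a minimal sketch. It assumes the standard tent map with slope 2 on [0, 1] and exact rational arithmetic; the function names (`tent`, `symbol`, `refined_symbol`, `refined_orbit`) are hypothetical and are not taken from the paper. A point x receives the refined symbol (i, j) when f(x) lies in A_i and x lies in A_j, which is the membership condition for f^{-1}(A_i) ∩ A_j.

```python
from bisect import bisect_right
from fractions import Fraction


def tent(x):
    """Tent map with slope 2 on [0, 1] (assumed form of the map)."""
    return 2 * x if x < Fraction(1, 2) else 2 * (1 - x)


def symbol(x, cuts):
    """Index of the partition element containing x; cuts are sorted interior points."""
    return bisect_right(cuts, x)


def refined_symbol(x, cuts):
    """Symbol of x under the map refinement: (i, j) with f(x) in A_i and x in A_j."""
    return (symbol(tent(x), cuts), symbol(x, cuts))


def refined_orbit(x0, cuts, steps):
    """Refined symbol sequence along an orbit that starts at x0."""
    x, out = x0, []
    for _ in range(steps):
        out.append(refined_symbol(x, cuts))
        x = tent(x)
    return out


# Illustrative usage: binary partition with the partition point at 0.444.
orbit = refined_orbit(Fraction(3, 11), [Fraction(444, 1000)], 12)
```

Rational inputs are used because iterating the slope-2 tent map in binary floating point loses the orbit after a few dozen steps. Labelling elements by index pairs, rather than by interval endpoints, also covers the case in which an element of the refinement is a union of disjoint intervals.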

D. Causal Structure 

Finally, we turn to the causal structure of a symbolic dynamics, which provides a description of the process finer 
than the Markov order. Unlike the Markov order, causal structure quantifies the minimal amount of the past history 
that is needed to mitigate the uncertainty about the present symbolic state. 

For the tent map, the uncertainty of the symbolic state as measured by the entropy $H(S_t)$ achieves its maximum
at $\alpha = 0.5$. Including information of past states generally reduces the uncertainty, as shown in Fig. 7(a), except
at $\alpha = 0.5$, which is in fact a point for which the symbolic dynamics is topologically conjugate (equivalent) to
the original one. The fact that the $\alpha = 0.5$ partition creates an i.i.d. process is interesting because, from the
dynamic equation of the system, states that are adjacent in time are intimately linked and expected to be causally
related. An important conclusion here is the following: partitioning of the phase space that results in a symbolic
dynamics that is equivalent to the original dynamics can in fact yield a causal structure which differs significantly
from that inferred from the form of the equations of the original system.

FIG. 7: (Color online) Uncertainty quantification of symbolic states of the tent map. (Conditional) entropies
$H(S_t \mid S_{t-1}, \dots, S_{t-k+1}, S_{t-k})$ for values of $k = 0, 1, \dots, 5$, for the entire range of
$\alpha \in (0, 1)$ (a) and a subrange $\alpha \in (0.43, 0.52)$ (b). Vertical dashed lines in both panels mark four
specific choices of $\alpha$: 0.444, 0.47, 0.5, and 0.516, respectively.
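
The conditional entropies discussed above can be computed exactly for a binary partition, as the following sketch suggests. It rests on assumptions not stated in this section: the tent map has slope 2 on [0, 1], Lebesgue measure is the invariant measure, and the Markov order is read off by comparing each conditional entropy with the one at a finite truncation depth. This is a simplified criterion, not the authors' procedure, and all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import log2


def tent(x):
    """Tent map with slope 2 on [0, 1] (assumed form of the map)."""
    return 2 * x if x < Fraction(1, 2) else 2 * (1 - x)


def word_probabilities(alpha, n):
    """Exact probabilities of length-n symbol words under Lebesgue measure."""
    # The word changes only where some iterate f^j(x), j < n, crosses alpha,
    # so the preimages of alpha cut [0, 1] into intervals of constant word.
    level, cuts = {alpha}, {Fraction(0), Fraction(1)}
    for _ in range(n):
        cuts |= level
        level = {y / 2 for y in level} | {1 - y / 2 for y in level}
    cuts = sorted(cuts)
    probs = defaultdict(Fraction)
    for left, right in zip(cuts, cuts[1:]):
        x, word = (left + right) / 2, []   # any interior point gives the word
        for _ in range(n):
            word.append(0 if x < alpha else 1)
            x = tent(x)
        probs[tuple(word)] += right - left
    return probs


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(float(p) * log2(float(p)) for p in probs.values() if p > 0)


def conditional_entropies(alpha, depth):
    """H(S_t | previous k symbols) in bits, for k = 0, ..., depth."""
    joint = [entropy(word_probabilities(alpha, n)) for n in range(depth + 2)]
    return [joint[k + 1] - joint[k] for k in range(depth + 1)]


def markov_order(alpha, depth=6, tol=1e-3):
    """Smallest k whose conditional entropy is within tol * H(S_t) of the deepest one."""
    h = conditional_entropies(alpha, depth)
    for k in range(depth + 1):
        if h[k] - h[depth] < tol * h[0]:
            return k
    return depth
```

For the partition point 1/2, every word of length n has probability 2^{-n}, so each conditional entropy equals one bit and the sketch returns order 0, in line with the i.i.d. behavior described above. Pass the partition point as a `Fraction` to keep the arithmetic exact.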

Recall that a process is Markov of order $k$ if no further reduction of uncertainty is possible beyond the $k$-th past
state. However, the extent to which uncertainty is reduced does not need to be monotonic in the time indices. In other
words, the immediate past does not necessarily encode the most information about the present state. In fact, for
several values of $\alpha$ (e.g., $\alpha = 0.444$ and $\alpha = 0.47$), the difference between the conditional
entropies $H(S_t \mid S_{t-k}, \dots, S_{t-1})$ for consecutive $k$'s is not monotonically decreasing in $k$
[Fig. 7(b), vertical spacing between curves]. Applying the oCSE algorithms to infer the causal structure for these
$\alpha$ values, we confirmed the Markov order previously computed and, more importantly, found that the past time
states are ordered by relative importance in a non-monotonic manner, namely $(t-2, t-3, t-1)$ for $\alpha = 0.444$
(Markov order $k = 3$) and $(t-3, t-2, t-1, t-4)$ for $\alpha = 0.47$ (Markov order $k = 4$). We examine all values of
$\alpha$ in the interval $[0, 1]$ in a uniform manner, $\{0, 0.001, 0.002, \dots, 0.999, 1\}$, using a threshold value
of $10^{-3} H(\alpha)$ for the causation entropy at the given $\alpha$. The results are shown in Fig. 8. In particular,
we found several examples for which the Markov order satisfies $k < 6$ while the number of causal parents is strictly
less than $k$ (i.e., certain Markov time indices are skipped in the causal structure).
FIG. 8: (Color online) Causal structures from the symbolic dynamics of the tent map when the partition point $\alpha$
is chosen from $\{0, 0.001, 0.002, \dots, 0.999, 1\}$. For each $\alpha$ we distinguish the first causal parent
computed from the forward (aggregative discovery) step of the oCSE algorithm (light red), all causal parents of $t$
from the set $\{t-1, t-2, \dots, t-6\}$ (gray), and noncausal components (black). In all computations we used a
threshold $10^{-3} H(\alpha)$ under which causation entropy is regarded as zero.
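
The forward and backward steps mentioned above can be sketched as a greedy selection over past lags. This is an oCSE-style illustration under stated assumptions, not the authors' implementation: the tent map is taken with slope 2, Lebesgue measure is assumed invariant, the candidate lags are 1 to 6, and every function name is hypothetical. The causation entropy from a lag to the present, given a set of lags, is evaluated as a difference of two conditional entropies.

```python
from collections import defaultdict
from fractions import Fraction
from math import log2


def tent(x):
    """Tent map with slope 2 on [0, 1] (assumed form of the map)."""
    return 2 * x if x < Fraction(1, 2) else 2 * (1 - x)


def joint_distribution(alpha, n):
    """Exact law of n consecutive symbols; the last entry of each word is S_t."""
    level, cuts = {alpha}, {Fraction(0), Fraction(1)}
    for _ in range(n):
        cuts |= level
        level = {y / 2 for y in level} | {1 - y / 2 for y in level}
    cuts = sorted(cuts)
    probs = defaultdict(float)
    for left, right in zip(cuts, cuts[1:]):
        x, word = (left + right) / 2, []
        for _ in range(n):
            word.append(0 if x < alpha else 1)
            x = tent(x)
        probs[tuple(word)] += float(right - left)
    return probs


def entropy_of(probs, positions):
    """Shannon entropy (bits) of the marginal on the given word positions."""
    marginal = defaultdict(float)
    for word, p in probs.items():
        marginal[tuple(word[i] for i in positions)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def causation_entropy(probs, n, lag, given):
    """H(S_t | given lags) - H(S_t | given lags and lag); lag 0 is the present."""
    def pos(lags):
        return [n - 1 - l for l in lags]
    g = sorted(given)
    gl = sorted(g + [lag])
    h_given = entropy_of(probs, pos([0] + g)) - entropy_of(probs, pos(g))
    h_both = entropy_of(probs, pos([0] + gl)) - entropy_of(probs, pos(gl))
    return h_given - h_both


def causal_parents(alpha, max_lag=6, tol=1e-3):
    """Greedy forward discovery followed by backward removal of redundant lags."""
    n = max_lag + 1
    probs = joint_distribution(alpha, n)
    threshold = tol * entropy_of(probs, [n - 1])
    parents, candidates = [], list(range(1, max_lag + 1))
    while candidates:  # forward step: add the most informative remaining lag
        best = max(candidates, key=lambda l: causation_entropy(probs, n, l, parents))
        if causation_entropy(probs, n, best, parents) < threshold:
            break
        parents.append(best)
        candidates.remove(best)
    for lag in list(parents):  # backward step: drop lags redundant given the rest
        rest = [l for l in parents if l != lag]
        if causation_entropy(probs, n, lag, rest) < threshold:
            parents = rest
    return parents
```

The returned list keeps the lags in the order in which the forward step found them, so its first entry plays the role of the first causal parent highlighted in Fig. 8. For the partition point 1/2 the list is empty, because every causation entropy vanishes for an i.i.d. process.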


V. SUMMARY AND FINAL REMARKS 

Symbolization is a common practice in data analysis: in the field of dynamical systems, it bridges topological 
dynamics and stochastic processes through partitioning/symbolization of the phase space; in causality inference, 
it allows for the description of continuous random variables by discrete ones. Symbolized data, in turn, are not as 
demanding in terms of precision and are often considered more robust with respect to parameters and noise.

Motivated by the problem of uncovering causal structures from finite, discrete data, we investigated the symbolization
of outputs from a simple dynamical system, namely the tent map. We provided a full description of the joint
probabilities arising from partitioning/symbolization of the phase space and investigated how Markov order and
causal structure can be determined from these probabilities in terms of causation entropy, an information-theoretical
measure. We found that in general, partitioning of the phase space strongly influences the Markov order and causal
structure of the resulting stochastic process in an irregular manner which is difficult to classify and predict. In
particular, a small change in the partition can lead to relatively large and unexpected changes in the resulting Markov
order and causal structure. To the best of our knowledge, this is the first attempt in the literature that aims at
unravelling the intricate dependence of inferred causal structures of dynamical systems on their different symbolic
descriptions analyzed in an information-theoretic setting. Furthermore, although the effects of map refinements are
well understood, it remains a major challenge to discover the exact consequences of arbitrary refinements. Especially
for this reason, we have left the application of our approach to more complex dynamical systems and/or experimental
time-series data to future investigations.

From a different perspective, we note that although finding partitions that preserve dynamical invariants (i.e.,
generating partitions) is known to be a real challenge, especially for high-dimensional systems [51–53], it is not yet
clear whether such a challenge remains when considering partitions that maintain Markov order and/or causal
structure. This avenue of research can be especially interesting to explore given recent advances in many different
perspectives on partitioning the phase space, including adaptive binning [20], ranking and permutation of variables
[?–23], and nearest-neighbor statistics [?].

Finally, we remark that the non-uniqueness of symbolic descriptions of a system implies that important concepts
such as the Markov order and causal structure are not necessarily absolute concepts: rather, they unavoidably depend
on the observational process, just like classical relativity of motion and quantum entanglement [59]. This, in turn,
suggests that the causal structure of the very same system may be perceived differently, even given an unlimited
amount of data. The concept of causality, therefore, is observer-dependent.
Appendix A: Monotonic Dependence of Markov Order on Map Refinements

We will prove that for a transformation $f$ that has a uniquely ergodic invariant probability measure $\mu$, the Markov
order of the stochastic process resulting from a partition $\mathcal{A}$ of the phase space decreases strictly by one
under a map refinement of the partition unless the original Markov order is less than or equal to one.

Definition: Markov order of a partition. Consider a measure-preserving transformation $f : M \to M$ on a
compact metric space with a uniquely ergodic invariant probability measure $\mu$ [60]. Let
$\mathcal{A} = \{A_i\}_{i=0}^{m}$ be a measurable partition of the phase space that yields a stochastic process with
time-invariant joint probabilities

$$P(s_t = i_t, s_{t-1} = i_{t-1}, \dots, s_{t-\ell} = i_{t-\ell}) \overset{\Delta}{=} \mu\big(A_{i_{t-\ell}} \cap f^{-1}(A_{i_{t-\ell+1}}) \cap \dots \cap f^{-\ell}(A_{i_t})\big) . \qquad (A1)$$

If such a process is Markov of order $k$, we define the Markov order of the partition to be $k$.

Remark: In the definition, the uniqueness of the invariant measure implies ergodicity and ensures the well-definedness
of the joint probabilities [60].




Definition: map refinement. Consider a measure-preserving transformation $f : M \to M$ with a probability
measure $\mu$. The map refinement of a given measurable partition $\mathcal{A} = \{A_i\}_{i=0}^{m}$ is defined as the
partition

$$\mathcal{R}(\mathcal{A}) \overset{\Delta}{=} f^{-1}(\mathcal{A}) \vee \mathcal{A} = \{ f^{-1}(A_i) \cap A_j \}_{i,j=0}^{m} . \qquad (A2)$$

Theorem (Markov order upon map refinement). Consider a measure-preserving transformation $f : M \to M$
on a compact metric space with a uniquely ergodic invariant probability measure $\mu$. Let
$\mathcal{A} = \{A_i\}_{i=0}^{m}$ be a partition of $M$ and $\mathcal{R}(\mathcal{A})$ be its map refinement. Suppose
that the Markov orders of $\mathcal{A}$ and $\mathcal{R}(\mathcal{A})$ are $k$ and $\tilde{k}$, respectively. It
follows that $\tilde{k} \le 1$ for $k \le 1$, and $\tilde{k} = k - 1$ when $k > 1$.

Proof. We shall denote the probabilities resulting from the map refinement of $\mathcal{A}$ as

$$P\big(s_t = (i_t, j_t),\, s_{t-1} = (i_{t-1}, j_{t-1}),\, \dots,\, s_{t-\ell} = (i_{t-\ell}, j_{t-\ell})\big) = \mu\big(A_{i_{t-\ell}, j_{t-\ell}} \cap f^{-1}(A_{i_{t-\ell+1}, j_{t-\ell+1}}) \cap \dots \cap f^{-\ell}(A_{i_t, j_t})\big), \qquad (A3)$$

where $A_{i,j} = f^{-1}(A_i) \cap A_j$. Since every sequence $\{s_t\}$ is determined by some orbit $\{x_t\}$ of $f$
under the partition $\mathcal{R}(\mathcal{A})$, it follows that $s_t = (i_t, j_t)$ if and only if
$x_t \in f^{-1}(A_{i_t}) \cap A_{j_t}$. On the other hand, $x_{t-1} \in f^{-1}(A_{i_{t-1}})$ together with
$x_t = f(x_{t-1})$ implies that $x_t \in A_{i_{t-1}}$. Therefore $j_t = i_{t-1}$ in Eq. (A3) and

$$P\big(s_t = (i_t, j_t),\, s_{t-1} = (i_{t-1}, j_{t-1}),\, \dots,\, s_{t-\ell+1} = (i_{t-\ell+1}, j_{t-\ell+1})\big) = P(s_t = i_t,\, s_{t-1} = i_{t-1},\, \dots,\, s_{t-\ell} = i_{t-\ell}) \qquad (A4)$$

for all sequences $(i_t, i_{t-1}, \dots)$ with nonvanishing probability, where the left-hand side refers to the refined
process (whose states are pairs) and the right-hand side to the original process. Then, the Theorem follows from
applying Eq. (A4) to the definition of Markov order given in Eq. (10), rewritten using the product rule (chain rule) of
conditional probability.
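
The final step can be spelled out as follows. This is a sketch of the reasoning rather than the authors' own derivation, and the tildes marking the refined process ($\tilde{P}$, $\tilde{s}_t$) are notation introduced here for illustration only. For $\ell \ge 1$ and compatible sequences (those with $j_\tau = i_{\tau-1}$), taking the ratio of two instances of Eq. (A4) and using the time invariance of the joint probabilities gives

```latex
\begin{align*}
\tilde{P}\big(\tilde{s}_t \mid \tilde{s}_{t-1}, \dots, \tilde{s}_{t-\ell}\big)
  &= \frac{\tilde{P}(\tilde{s}_t, \tilde{s}_{t-1}, \dots, \tilde{s}_{t-\ell})}
          {\tilde{P}(\tilde{s}_{t-1}, \dots, \tilde{s}_{t-\ell})} \\
  &= \frac{P(s_t = i_t, \dots, s_{t-\ell-1} = i_{t-\ell-1})}
          {P(s_{t-1} = i_{t-1}, \dots, s_{t-\ell-1} = i_{t-\ell-1})} \\
  &= P\big(s_t = i_t \mid s_{t-1} = i_{t-1}, \dots, s_{t-\ell-1} = i_{t-\ell-1}\big).
\end{align*}
```

Conditioning the refined process on $\ell \ge 1$ past states is thus equivalent to conditioning the original process on $\ell + 1$ past states. If the original process has Markov order $k > 1$, the right-hand side stops changing once $\ell + 1 \ge k$, which suggests order $k - 1$ for the refined process; if $k \le 1$, it is already constant for $\ell = 1$, so the refined order is at most 1.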